The Significance Levels (for the Explanatory Variables) vs. Beta-Weights

To keep these distinct in your mind, link “significance levels” with the word “individual,” and link “beta-weights” with “population.” The significance level for an explanatory variable tells you whether you’d like to take this variable into account when making a prediction for an individual, i.e., whether you’d like to know the value of this variable for that individual. The beta-weight (when compared to other beta-weights) of an explanatory variable tells you how important this variable is in helping to explain why the dependent variable takes varying values across the population.

Example: The Sunday edition of most metropolitan newspapers usually contains a “Homes and Living” section. A standard feature is the “add something to your home” article, which focuses on adding something such as an outdoor hot-tub to your property.

The article will include contractors’ estimates of the cost of the addition and of its maintenance expenses, along with interviews in which homeowners who’ve invested in such an addition describe how they feel about it after the fact. This information, combined with your career/moving plans and personal preferences, lets you estimate the initial capital investment, the income stream (maintenance costs netted against consumption value), and the duration of the investment if you were to add such a feature. All that remains in order to evaluate the investment decision is the liquidation value of the investment, i.e., the increment to the market value of your home that such an addition would yield. This incremental value is the final piece of information the article will offer, and is typically the result of a regression analysis.

In order to estimate the effect on market value of, for example, the addition of an outdoor hot-tub on the property, we’d want to use the most complete regression model available (including a “yes/no” variable which is 1 if a hot-tub is present on a property, and 0 otherwise). Imagine the result of estimating the coefficients of such a model in two different markets, Winnetka, Illinois, and Palo Alto, California.

We might well find that the coefficient of the hot-tub variable is $5000 in both markets, and (depending on the sample sizes of the studies) that the standard error of the coefficient in each market is $500. This yields a t-ratio of 10.0 in both markets, and a significance level for the hot-tub variable very near 0%: Both studies would then provide strong evidence that the hot-tub variable belongs in the model, and that a property assessor should note whether a hot-tub is present or not in appraising the market value of any individual home.
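The arithmetic in this paragraph can be sketched in a few lines of Python. The function names are illustrative, and the significance level is computed under a large-sample normal approximation, which is an assumption; a small sample would call for the t distribution with the appropriate degrees of freedom.

```python
import math

def t_ratio(coefficient, standard_error):
    """Estimated coefficient divided by its standard error."""
    return coefficient / standard_error

def two_sided_significance(t):
    """Two-sided significance level for a t-ratio.

    Assumption: the sample is large enough that the standard normal
    distribution is a good stand-in for the t distribution.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))

# Figures from the text: coefficient of $5000, standard error of $500.
t = t_ratio(5000.0, 500.0)
p = two_sided_significance(t)
print(f"t-ratio = {t:.1f}, significance level = {p:.3g}")
```

Because both markets share the same coefficient and standard error, the same calculation applies to each, which is why the significance level alone cannot distinguish Winnetka from Palo Alto.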

At the same time, it is likely that the beta-weight of the hot-tub variable would be relatively small when compared with the beta-weights of other explanatory variables in the Winnetka study, and relatively large in the Palo Alto study. In Winnetka, residential lot sizes vary substantially, as does the interior living space in homes. Distance from the lake is another important factor in explaining why some houses sell for much more than others. And fewer than one home in a hundred has a hot-tub on the property (since the weather is too cold to use an outdoor hot-tub for much of the year). If someone asked you, “Why do property values vary in the Winnetka market?” you’d offer many explanations – lot sizes and interior living space vary across the population of homes, as does distance from the lake – before you’d answer, “and some houses have hot-tubs in the yard, while others don’t.”

On the other hand, Palo Alto is a newer, more heavily zoned community. Lot sizes and interior living space don’t vary much from one home to the next, there is no “serious” lake in the area, and, as a result, the range of housing market values is much narrower than in Winnetka. However, roughly half the homes have hot-tubs outside. If asked, “Why do housing prices vary in Palo Alto?” one of your first answers would be, “Well, some have hot-tubs and some don’t.” In both cases, the relative size of the beta-weight of the hot-tub variable indicates the relative importance of variation in that variable – 1 for some homes, 0 for others – in helping to explain why market values of homes vary across the population of homes in the town.

The beta-weight of each explanatory variable in a model is proportional to the product of the regression coefficient and the standard deviation of that explanatory variable (that is, a beta-weight combines the effect of a variable with the extent to which the effect is exerted). Even if lot size has the same coefficient in both studies, the standard deviation of lot size is smaller, and the standard deviation of the hot-tub variable is larger, in Palo Alto than in Winnetka. This is why the beta-weight of the lot-size variable might be greater than the beta-weight of the hot-tub variable in the Winnetka study, while the reverse is true in Palo Alto.
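To make the comparison concrete, here is a minimal sketch that uses the usual definition of a standardized coefficient: the regression coefficient times the standard deviation of the explanatory variable, divided by the standard deviation of the dependent variable. The $5000 hot-tub coefficient and the hot-tub shares (about 1 in 100 versus about half) follow the text; the lot-size coefficient and every standard deviation of lot size and price are hypothetical numbers chosen only for illustration.

```python
import math

def dummy_std_dev(share):
    """Standard deviation of a 0/1 variable that equals 1 for the given share."""
    return math.sqrt(share * (1.0 - share))

def beta_weight(coefficient, sd_x, sd_y):
    """Standardized coefficient: coefficient * sd(x) / sd(y)."""
    return coefficient * sd_x / sd_y

HOT_TUB_COEF = 5000.0   # dollars per hot-tub, same in both markets (from the text)
LOT_SIZE_COEF = 10.0    # dollars per square foot (hypothetical)

# Hot-tub shares follow the text; sd_lot and sd_price are hypothetical.
markets = {
    "Winnetka":  {"hot_tub_share": 0.01, "sd_lot": 10_000.0, "sd_price": 200_000.0},
    "Palo Alto": {"hot_tub_share": 0.50, "sd_lot": 200.0, "sd_price": 50_000.0},
}

results = {}
for name, m in markets.items():
    hot_tub = beta_weight(HOT_TUB_COEF, dummy_std_dev(m["hot_tub_share"]), m["sd_price"])
    lot_size = beta_weight(LOT_SIZE_COEF, m["sd_lot"], m["sd_price"])
    results[name] = {"hot_tub": hot_tub, "lot_size": lot_size}
    print(f"{name}: hot-tub beta = {hot_tub:.4f}, lot-size beta = {lot_size:.4f}")
```

With these made-up inputs the coefficients are identical across the two markets, yet the ranking of the two beta-weights flips, purely because the standard deviations differ.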

If an explanatory variable varies substantially within the sample, this variation will increase its beta-weight. At the same time, this variation makes it easier to estimate the effect of the variable accurately, i.e., it reduces the standard error of the coefficient and therefore increases the t-ratio. Consequently, in many regressions the significance levels and beta-weights vary in opposite directions (a larger beta-weight tends to accompany a smaller significance level, i.e., stronger evidence), especially if the sample size is small. The above example illustrates that this will not always be the case, and that the significance levels and beta-weights do tell different stories.
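The link between the spread of an explanatory variable and the standard error of its coefficient can be seen in a small simulation. This is a sketch for a one-variable regression only (the multi-variable case also depends on how the variables are correlated with one another); the sample size, true slope, noise level, and the factor of 5 are all arbitrary assumptions.

```python
import math
import random

def slope_and_standard_error(x, y):
    """Least-squares slope of y on x (with intercept) and its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (n - 2))
    return slope, residual_sd / math.sqrt(sxx)

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 200
TRUE_SLOPE = 2.0                # hypothetical effect of x on y
base_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise = [rng.gauss(0.0, 10.0) for _ in range(n)]

narrow_x = base_x                       # little variation in x
wide_x = [5.0 * xi for xi in base_x]    # same pattern, five times the spread

narrow_y = [TRUE_SLOPE * xi + e for xi, e in zip(narrow_x, noise)]
wide_y = [TRUE_SLOPE * xi + e for xi, e in zip(wide_x, noise)]

slope_narrow, se_narrow = slope_and_standard_error(narrow_x, narrow_y)
slope_wide, se_wide = slope_and_standard_error(wide_x, wide_y)

print(f"narrow x: slope = {slope_narrow:.3f}, standard error = {se_narrow:.3f}")
print(f"wide x:   slope = {slope_wide:.3f}, standard error = {se_wide:.3f}")
```

The same noise is reused in both runs, so the only thing that changes is the spread of x; the standard error shrinks in proportion to that spread.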

The significance level for an explanatory variable in a regression model answers a marginal question: “After all the other variables in the current model are already taken into account, how strong is the evidence that this variable contributes something of additional relevance?” Adding a new variable which increases the adjusted coefficient of determination certainly increases the apparent explanatory power of the model. But merely adding a totally irrelevant explanatory variable will, roughly a third of the time, increase the adjusted coefficient of determination by at least a little bit (this happens whenever the t-ratio of the new variable exceeds 1 in absolute value). The significance level for the new variable tells us whether adding it to the model increases the adjusted coefficient of determination by enough to warrant serious attention.
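How often a purely irrelevant variable raises the adjusted coefficient of determination can be estimated by simulation. The sketch below makes a simplifying assumption: the starting model contains only an intercept (so its adjusted coefficient of determination is exactly 0), and a single pure-noise variable is added. The number of trials, the sample size, and the seed are arbitrary choices.

```python
import random

def adjusted_r2_simple(x, y):
    """Adjusted R^2 of a one-variable least-squares regression with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)

def share_of_improvements(trials=2000, n=50, seed=0):
    """Fraction of trials in which an irrelevant x raises adjusted R^2 above 0."""
    rng = random.Random(seed)
    improved = 0
    for _ in range(trials):
        # x and y are generated independently, so x is truly irrelevant.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The intercept-only baseline has adjusted R^2 of exactly 0.
        if adjusted_r2_simple(x, y) > 0.0:
            improved += 1
    return improved / trials

share = share_of_improvements()
print(f"Share of trials where adjusted R^2 rose: {share:.3f}")
```

An increase in the adjusted coefficient of determination is therefore weak evidence on its own, which is why the significance level of the added variable is the better guide.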